Estimating the historical and future probabilities of large terrorist events 
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Quantities with right-skewed distributions are ubiquitous in complex social systems, including 
political conflict, economics and social networks, and these systems sometimes produce extremely 
large events. For instance, the 9/11 terrorist events produced nearly 3000 fatalities, nearly six 
times more than the next largest event. But, was this enormous loss of life statistically unlikely 
given modern terrorism's historical record? Accurately estimating the probability of such an event 
is complicated by the large fluctuations in the empirical distribution's upper tail. We present a 
generic statistical algorithm for making such estimates, which combines semi-parametric models of 
tail behavior and a non-parametric bootstrap. Applied to a global database of terrorist events, we 
estimate the worldwide historical probability of observing at least one 9/11-sized or larger event 
since 1968 to be 11-35%. These results are robust to conditioning on global variations in economic 
development, domestic versus international events, the type of weapon used and a truncated history 
that stops at 1998. We then use this procedure to make a data-driven statistical forecast of at least 
one similar event over the next decade. 



The September 11th terrorist attacks were the largest 
such events in modern history, killing nearly 3000 peo- 
ple p], [1] . Given their severity, should these attacks be 
considered statistically unlikely or even outliers? What is 
the likelihood of another September llth-sized or larger 
terrorist event, worldwide, over the next decade? 

Accurate answers to such questions would shed new 
light both on the global trends and risks of terrorism 
and on the global social and political processes that gen- 
erate these rare events which depends in part on 
determining whether the same processes generate both 
rare, large events and smaller, more common events. In- 
sights would also provide objective guidance for our long- 
term expectations in planning, response and insurance 
efforts [(| 0], and for estimating the likelihood of even 
larger events, including mass-casualty chemical, biologi- 
cal, radioactive or nuclear (CBRN) events 0, Q . 

The rarity of events like 9/11 poses two technical 
problems: (i) we typically lack quantitative mechanism- 
based models with demonstrated predictive power at the 
global scale (which is particularly problematic for CBRN 
events) and (ii) the global historical record contains few 
large events from which to estimate mechanism-agnostic 
statistical models of large events alone. That is, the rar- 
ity of big events implies large fluctuations in the dis- 
tribution's upper tail, precisely where we wish to have 
the most accuracy. These fluctuations can lead to poo r 
out-of-sample predictive power in conflict (see [l34l3) 
and can complicate both selecting the correct model of 
the tail's structure and accurately estimating its param- 
eters pl|- Misspecification can lead to severe underes- 



timates of the true probability of large events, e.g., in 
classical financial risk models [17l Il8j. 

Little research on terrorism has focused on directly 
modeling the number of deaths ( "severity" ) 1 in individ- 
ual terrorist events When deaths are considered, 
they are typically aggregated and used as a covariate to 
understand other aspects of terrorism, e.g., trends over 
time plU [20|. the when, where, what, how and why of 
the resort to terrorism [2ll - |23j ]. differences between or- 
ganizations [3 , or the incident rates or outcomes of 
events I19i. l25| . Such efforts have used time series analy- 
sis p^l. |20L |25|. qualitative models or human expertise 
of specific scenarios, actors, targets or attack s 126 1 o r 
quantitative models based on factor analysis [27l 28j. 
social networks 0, [3(3] or formal adversarial interac- 
tions [iH [3ll HI)]- Most of this work focuses on mod- 
eling central tendencies, treats large events like 9/11 as 
outliers, and says little about their quantitative proba- 
bility [33] or their long-term hazard. 

Here, we describe a statistical algorithm for estimating 
the probability of large events in complex social systems 
in general, and in global terrorism in particular. Making 
only broad-scale and long-term probabilistic estimates, 
our approach is related to techniques used in seismology, 
forestry, hydrology and natural disaster insurance to es- 
timate the probabilities of individual rare catastrophic 
events @, 0, I34l - l37| . Our approach combines maximum- 
likelihood methods, multiple models of the distribution's 
tail, and computational techniques to account for both 
parameter and model uncertainty. It provides a quan- 



1 Other notions of event "size" or severity, which we do not explore 
here, might be the economic cost, number injured, political im- 
*Electronic address: aaron.clauset@colorado.edu pact, etc. To the extent that such notions may be quantitatively 

^Electronic address: rwoodard@ethz.ch measured, our algorithm could also be applied to them. 
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titativc estimate of the probability, with uncertainty, of 
a large event. The algorithm also naturally generalizes 
to include certain event covariates, which can shed addi- 
tional light on the probability of large events of different 
types. 

Using this algorithm to analyze a database of 13,274 
deadly terrorist events worldwide from 1968-2007, we es- 
timate the global historical probability of at least one 
9/11-sized or larger terrorist event over this period to be 
roughly 11-35%. Furthermore, we find the non-trivial 
magnitude of this historical probability to be highly ro- 
bust, a direct consequence of the highly righ t-skewed or 
"heavy-tailed" structure of event sizes [33|]. Thus, an 
event of size or severity of the September 11th terrorist 
attacks, compared to the global historical record, should 
not be considered a statistical outlier or even statistically 
unlikely. Using three potential scenarios for the evolu- 
tion of global terrorism over the next decade, we then 
estimate the worldwide future probability of a similarly 
large event as being not significantly different from the 
historical level. We close by discussing the implications 
for forecasting large terrorist events in particular and for 
complex social systems in general. 



I. ESTIMATING THE PROBABILITY OF A 
LARGE EVENT 

The problem of estimating the probability of some ob- 
served large event is a kind of tail-fitting problem, in 
which we estimate parameters for a distributional model 
using only the largest several observations. This task 
is distinct from estimating the distribution of maxima 
within a sample @, 0] , and is more closely related to the 
peaks-over-threshold literature in hydrology, seismology, 
forestry, finance and insurance [(I I34T439} . Here, we aim 
specifically to deal with several sources of uncertainty in 
this task: uncertainty in the location of the tail, uncer- 
tainty in the tail's true structure, and uncertainty in the 
model parameters. 

Our approach is based on three key insights. First, be- 
cause we are interested only in rare large events, we need 
only model the structure of the distribution's right or 
upper tail, which governs their frequency. This replaces 
the difficult problem of modeling both the distribution's 
body and tail [1, 0, HI] with the less difficult problem 
of identifying a value x m i n above which a model of the 
tail alone fits well. 2 That is, choose some cc m i n and a tail 
model Pr(a; | 9, x m ; n ) defined on x £ [x m i n ,oo). We will 
revisit the problem of choosing cc m i n below. 

Second, in complex social systems, the correct tail 
model is typically unknown and a poor choice may lead 
to severe misestimates of the true probability of a large 



event. We control for this model uncertainty by consider- 
ing multiple tail models. Given these models and a com- 
mon choice of :Emin, we use a likelihood ratio test to iden- 
tify and discard the statistically implausible ones [l6| . In 
principle, the remaining models could be averaged to pro- 
duce a single estimate with confidence intervals [4(j], e.g., 
to aid decision makers. We return to this point in more 
detail below. 

Finally, large fluctuations in the distribution's upper 
tail occur precisely where we wish to have the most 
accuracy, leading to parameter uncertainty. Using a 
non-parametric bootstrap 41| to simulate the genera- 
tive process of event sizes, we incorporate the empirical 
data's inherent variability into the estimated parameters, 
weight models by their likelihood under the bootstrap 
distribution and construct extreme value confidence in- 
tervals [37j . 

This combination of techniques provides a statistically 
principled and data-driven solution for estimating the 
probability of observing rare events in empirical data 
with unknown tail structure. If such an event is ob- 
served, the algorithm provides a measure of whether its 
occurrence was in fact unlikely, given the overall struc- 
ture of the distribution's tail. For instance, if the esti- 
mated probability is negligible (say p < 0.01), the event 
may be judged statistically unlikely. When several tail 
models are plausible and agree that the probability is 
away from p = 0, the event can be judged to be sta- 
tistically likely, despite the remaining uncertainty in the 
tail's structure. 



A. The method 

Our goal is to estimate the probability that we would 
observe at least I "catastrophic" events of size x or 
greater in an empirical sample. 3 In principle, any size 
x and any value I may be chosen, but in practice we typ- 
ically choose x as the largest (and thus rarest) event in 
the empirical data and set £ = 1. To ensure that our 
estimate is meaningful from a historical perspective, we 
remove the catastrophic event (s) from the empirical sam- 
ple before applying the algorithm. Here we describe the 
method in terms of univariate distributions, but its gen- 
eralization to certain covariates is straightforward (see 
Appendix IC 3 c[) . 

Let Pi(x | 9,x m i n ) denote a particular tail model with 
parameters 9, let {xi} denote the n empirical event sizes 
(sans the catastrophic events), and let Y — {y 3 } be a 
bootstrap of these data (n samples drawn from {x^ with 
replacement). To begin, we assume a fixed x m j n , the 



The notation a; m i n should not be confused with the first order 
statistic, xm) = min; X{. 



3 Consider events to be generated by a kind of marked point pro- 
cess |4'J . where marks indicate either the event's severity or that 
it exceeded some threshold x. Although we assume the num- 
ber of marks n to be fixed, this could be relaxed to incorporate 
additional uncertainty into the algorithm's output. 
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smallest value for which the tail model holds, and later 
describe the generalization to variable x m - m . 

The fraction of empirical events with values in the tail 
region is ptaSX = #{2^ > x m i n }/n, and in each bootstrap 
the number is a binomial random variable with probabil- 
ity 

"-tail ~ Binomial(n,ptaii) ■ (1) 

The maximum likelihood estimate 9 is a deterministic 
function of the portion of Y above x m ; n , which we denote 

0{Y : Xmin) • 

Given that choice, the probability under the fitted 
model that not one of n' tail = 1 + n ta ii events is at least 
as big as x is 

F{x I 6{Y, x min ))"-" = f / Pr(y | &, x min )y J 

(2) 

Thus, 1 — F(x I 9(Y, x m i n ))" tail is the probability that at 
least one event is of catastrophic size. Because the boot- 
strap Y is itself a random variable, to derive the marginal 
probability of observing at least one catastrophic event, 
we must integrate the conditional probability over the 
domain of the bootstrap distribution: 

p(n ta n,0) =p(n taih Y) = 

r / \ ntail 

/ Y x . ..y^ (l-F(x;9(Y 7 x min )) n ^j ]J r( Vl \ n tail ) 

(3) 

The trailing product series here is the probability of 
drawing the specific sequence of values yi, . . . 2/n tail from 
the fixed bootstrap distribution r. Finally, the total 
probability p of at least one catastrophic event is given 
by a binomial sum over this equation. 4 

When the correct value cc m i n is not known, it must 
be estimated jointly with 9 on each bootstrap. Max- 
imum likelihood cannot be used for this task, because 
x m - m truncates Y. Several principled methods for auto- 
matically choosing x m ; n exist, e.g., [l~6l l33l I3TL |43T - |4o| . So 
long as the choice of x m - m is also a deterministic function 
of Y, the above expression still holds. Variation in x m ; n 
across the bootstraps, however, leads to different num- 
bers of observations ntail in the tail region. The binomial 
probability ptail is then itself a random variable deter- 
mined by Y, and ntail is a random variable drawn from 
a mixture of these binomial distributions. 



4 We may calculate p in either of two ways: (i) we draw ntail 
events from a tail model alone, or (ii) we draw n events from a 
conditional model, in which the per-event probability is q(x) = 
Pr(X >x\X> x milL ) Pr(X > x min ) = p tail (l - F(x | 9, x min )). 
When the probability of a catastrophic event is small, these cal- 
culations yield equivalent results. 



Analytically completing the above calculation can be 
difficult, even for simple tail models, but it is straightfor- 
ward to estimate numerically via Monte Carlo: 

1. Given the n empirical sizes, generate Y by draw- 
ing y,j, j — uniformly at random, with 
replacement, from the observed {x{\ (sans the I 
catastrophic events). 

2. Jointly estimate the tail model's parameters and 
x m in on Y, and compute ntail = #{yj > x min } (see 
Appendix |A")) . 

3. Set p = 1-F(x; 9) f + n ^ 1 , the probability of observ- 
ing at least I catastrophic events under this boot- 
strap model. 

Averaging over the bootstraps yields the estimated prob- 
ability p = (p) of observing at least I catastrophic-sized 
events. The convergence of p is guaranteed so long as 
the number of bootstraps ( step [H) tends to infinity [41| . 
Confidence intervals on p [33, [4JJ may be constructed 
from the distribution of the p values. If the tail model's 
cdf F(x; 9) in step [3] cannot be computed analytically, it 
can often be constructed numerically; failing that, p may 
always be estimated by sampling directly from the fitted 
model. 



B. Model comparison and model averaging 

In complex social systems, we typically do not know 
a priori which particular tail model is correct, and the 
algorithm described above will give no warning of a bad 
choice (but see Qil)- This issue is partly mitigated by 
estimating ir m i n , which allows us to focus our modeling 
efforts on the upper tail alone. But, without additional 
evidence of the model's statistical plausibility, the esti- 
mate p should be treated as provisional. 

Comparing the results from multiple tail models pro- 
vides a test of robustness against model misspecification, 
e.g., agreement across models that p > 0.01 strengthens 
the conclusion that the event is not statistically unlikely. 
However, wide confidence intervals and disagreements on 
the precise probability of a large event reflect the inherent 
difficulty of identifying the correct tail structure. 

To select reasonable models to compare, standard 
model comparison approaches may be used, e.g., a fully 
Bayesian approach [47j |. cross-validation 48], or mini- 
mum description length [49[ . Here, we use a goodness-of- 
fit test to establish the plausibility of the power-law dis- 
tribution [l(| and Vuong's likelihood ratio test [l6|,[5(| to 
compare it with alternatives. This approach has the ad- 
vantage that it can fail to choose one model over another 
if the difference in their likelihoods statistically insignifi- 
cant, given the data. 

In some circumstances, we may wish to average the 
resulting models to produce a single estimate with con- 
fidence intervals, e.g., to aid decision makers. However, 
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severity, x (deaths) severity, x (deaths) 



FIG. 1: Empirical severity distribution with 100 bootstrap power-law models for (a) fixed s; m i n = 10 and (b) estimated a; m i n - 
Overprinting illustrates the ensemble of estimated models (dashed lines show 90% CI on a) and the inherent uncertainty in 
the tail structure. Insets show the 90% confidence intervals for the estimated probability of observing at least one 9/11-sized 
event. 



averaging poses special risks and technical problems for 
estimating the probability of large events. For instance, 
traditional approaches to averaging can obscure the in- 
herent uncertainty in the tail's structure and can pro- 
duce spuriously precise confidence intervals [51j; a 
Bayesian approach would be inconsistent with our exist- 
ing framework; and an appropriate frequentist framework 
is not currently available, although one may be possible 
using insights from [52]. 

Thus, in our application below, we elect not to average 
and instead we present results for each model. Even with- 
out averaging, however, several valuable insights may be 
drawn. 



C. Tests of the method's accuracy 

To test the accuracy of our estimation algorithm, we 
examine its ability to recover the true probability of a 
rare event from synthetic data with known structure. To 
generate these synthetic data, we use the power-law dis- 
tribution 

Pr(») oc y- a , (4) 

where a > 1 is the "scaling" parameter and y > x min > 0. 
When a < 2, this distribution exhibits infinite variance 
and produces extreme fluctuations in the upper tail of 
finite-size samples. By defining a catastrophic event x 
to be the largest generated event within the n synthetic 
values, we make the test particularly challenging because 
the largest value exhibits the greatest fluctuations of all. 
Detailed results are given in Appendix [B] 

We find that despite the large fluctuations generated 
by the power-law distribution, the algorithm performs 
well: the mean absolute error (\p — p\) is small even 
for samples with less than 100 events, and decays like 



0(n -1 / 3 ). A small absolute deviation, however, may be 
an enormous relative deviation, e.g., if the true probabil- 
ity tends to zero or one. Our algorithm does not make 
this type of error: the mean ratio of the estimated and 
true probabilities (p/p) remains close to 1 and thus the 
estimate is close in relative terms, being only a few per- 
cent off for n > 100 events. 



II. HISTORICAL PROBABILITY OF 9/11 

Having described our statistical approach, we now 
use it to estimate the historical probability of observ- 
ing worldwide at least one 9/11-sized or larger terrorist 
event. 

Global databases of terrorist events show that event 
severities (number of deaths) are highly right-skewed 
or "heavy tailed" [J 0. We use the RAND-MIPT 
database [l|, which contains 13,274 deadly events world- 
wide from 1968-2007. The power law is a statistically 
plausible model of this distribution's tail, with a = 
2.4 ± 0.1, for x > £ min = 10 [jj, HI- A goodness-of- 
fit test fails to reject this model of tail event severities 
(p = 0.40 ±0.03 via Monte Carlo [16]), implying that the 
deviations between the power-law model and the empir- 
ical data are indistinguishable from sampling noise. 

This fact gives us license to treat as iid random vari- 
ables the severity of these events. This treatment does 
force a particular and uncommon theoretical perspec- 
tive on terrorism, in which a single global "process" pro- 
duces events, even if the actions of individual terrorists 
or terrorist organizations are primarily driven by local 
events. This perspective has much in common with sta- 
tistical physics, in which particular population-level pat- 
terns emerge from a sea of individual interactions. We 
discuss limitations of this perspective in Section IIV1 
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severity, x (deaths) Probabilityfat least one catastrophic event, 1 968-2007) 

FIG. 2: (a) Empirical event severities with 100 bootstrap models for the power-law, log-normal and stretched exponential tail 
models, with x m - m = 10 fixed, (b) Bootstrap distributions of p for each model, with overall estimates (Table [T| given by dashed 
lines. 



Past work shows that this apparent power-law pattern 
in global terrorism is remarkably robust. Although the 
estimated value of a varies somewhat with time [331 ] , the 
power-law pattern itself seems to persist over the 40 year 
period despite large changes in the international system. 
It also appears to be independent of the type of weapon 
(explosives, firearms, arson, knives, etc.), the emergence 
and increasing frequency of suicide attacks, the demise of 
many terrorist organizations, the economic development 
of the target country [33[ and organizational covariates 
like size (number of personnel), age and experience (total 
number of attacks) [53j. 



Comparing the power-law tail model against log- 
normal and stretched exponential (Weibull) distribu- 
tions, via a likelihood ratio test, yields log-likelihood ra- 
tios of 11 = -0.278 = 0.78) and 0.772 (p = 0.44), 
respectively [Iff. However, neither of these values is sta- 
tistically significant, as indicated by the large p- values 
for a test against 1Z = 0. Thus, while the power-law 
model is plausible, so too are these alternatives. This 
ambiguity illustrates the difficulty of correctly identify- 
ing the tail's structure and reinforces the need to use 
multiple tail models in estimating the likelihood of a rare 
event like 9/11. Furthermore, it implies that slight visual 
deviations in the empirical distribution's upper tail (see 
Fig. [T|) should not be interpreted as support either for or 
against any of these models. In what follows, we consider 
estimates derived from all three. 



To apply our algorithm to this problem, we must make 
several choices. For consistency with past work on the 
frequency of severe terrorist events [l|| [33j], we choose 
a^min automatically by minimizing the Kolmogorov- 
Smirnov goodness-of-fit statistic between the tail model 



and the truncated empirical data. 5 We use the discrete 
power-law distribution as our tail model (which implies 
x m in is also discrete; see Appendix [3} and compare its 
estimates to those made using log-normal and stretched 
exponential models. To avoid the problem of choosing an 
appropriate event count distribution, we keep the number 
of events n fixed. 

Finally using the RAND-MIPT event data (other 
sources [2[ yield similar results; see Appendix IC 2ft , we 
define x > 2749 to be a "catastrophic" event — the re- 
ported size of the New York City 9/11 events. 6 Remov- 
ing this event from the empirical data leaves the largest 
event as the 14 August 2007 coordinated truck bomb- 
ing in Sinjar, Iraq, which produced approximately 500 
fatalities. To illustrate the robustness of our results, we 
consider estimates derived from fixed and variable a; m i n 
and from our three tail models. We also analyze the im- 
pact of covariates like domestic versus international, the 
economic development of the target country and the type 
of weapon used. 



A. Uncertainty in the scaling parameter 

Let x m i n = 10 be fixed. Figure QJs shows 100 of the 
fitted bootstrap models, illustrating that by accounting 
for the uncertainty in a, we obtain an ensemble of tail 



5 Ref. [lfjl provides a thorough motivation of this strategy. Briefly, 
the KS statistic will be large either when x m j n is too small (in- 
cluding non-power-law data in the power-law fit) or when a; m i n 
is too large (when sample size is reduced and legitimately power- 
law data thrown out), but will be small between these two cases. 

6 Official sources differ slightly on the number killed in New York 
City. Repeating our analyses with other reported values does 
not significantly change our estimates. 
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est. Pr(a; > 2749) est. prob. p, 90% CI 

tail model parameters per event, q(x) 1968-2007 (bootstrap) 

power law (1) Pr(a), x min = 10 0.0000270200 (L299 [0.203, 0.405] 

power law (2) Pr(a, x min ) 0.0000346345 0.347 [0.182,0.669] 

stretched exp. Pr(/3, A), K m i n = 10 0.0000156780 0.187 [0.115,0.272] 

log-normal Pr(ft, a), a min = 10 0.0000090127 0.112 [0.063,0.172] 

TABLE I: Estimated per-event and worldwide historical probabilities for at least one catastrophic event over the period 
1968-2007, for four tail models. 



models and thus an ensemble of probability estimates 
for a catastrophic-sized event. The bootstrap parameter 
distribution Pr(d) has a mean (a) = 2.40, which agrees 
with the maximum likelihood value a = 2.4 [jq . 

To estimate the historical probability of 9/11, we use 
10,000 bootstraps with cc min fixed. Letting p denote the 
overall probability from the algorithm, we find p = 0.299, 
with 90% confidence intervals of [0.203, 0.405] (Fig. 
inset), or about a 30% chance over the 1968-2007 period. 

An event that occurs with probability 0.299 over 40 
years is not a certainty. However, for global terrorism, 
this value is uncomfortably large and implies that, given 
the historical record, the size of 9/11 should not be con- 
sidered a statistical fluke or outlier. 



B. Uncertainty in the tail location 

A fixed choice of x m j n underestimates the uncertainty 
in p due to the tail's unknown structure. Jointly esti- 
mating a and x m i n yields similar results, but with some 
interesting differences. Figure [If shows 100 of the boot- 
strap models. The distribution of x m i n is concentrated at 
^min = 9 or 10 (48% of samples), with an average scaling 
exponent of (a) = 2.40. However, 15% of models choose 
^min = 4 or 5, and these produce much heavier-tailed 
models, with (a) = 2.21. 

This bimodal distribution in a is caused by slight cur- 
vature in the empirical mid-to-upper tail, which may 
arise from aggregating multiple types of local events into 
a single global distribution (see Appendix IC 3 cl) . The 
algorithm, however, accounts for this curvature by auto- 
matically estimating a slightly wider ensemble of models, 
with correspondingly greater density in the catastrophic 
range. As a result, the estimated probability is larger 
and the confidence intervals wider. Using 10,000 boot- 
straps, we find p = 0.347, with 90% confidence intervals 
of [0.182,0.669], or about a 35% chance over the 1968- 
2007 period. 



C. Alternative tail models 

Comparing these estimates with those derived using 
log-normal and stretched exponential tail models pro- 
vides a check on their robustness, especially if the al- 
ternative models yield dramatically different estimates. 



The mathematical forms of the alternatives are 

log-normal Py(x) oc x _1 exp [—(In a; — /i) 2 /2a 2 ] 

stretched exp. Pr(a;) cx x^~ 1 e~ Xx ' 3 , 

where we restrict each to a "tail" domain 
(see Appendix [3} . In the stretched exponential, f3 < 1 
produces a heavy-tailed distribution; in the log-normal, 
small values of fi and large values of a yield heavy 
tails. Although both decay asymptotically faster than 
any power law, for certain parameter choices, these mod- 
els can track a power law over finite ranges, which may 
yield only marginally lower estimates of large events. 7 

To simplify the comparison between the tail models, 
we fix x m in = 10 and use 10,000 bootstraps for each fit- 
ted alternative tail model. This yields p — 0.112 (CI: 
[0.063,0.172]) for the log-normal and p = 0.187 (CL 
[0.115,0.272]) for the stretched exponential, or roughly 
an 11% and 19% chance, respectively. These values are 
slightly lower than the estimates from the power-law 
model, but they too are consistently away from p = 0, 
which reinforces our conclusion that the size of 9/11 
should not be considered a statistical outlier. 

Figure [Da shows the fitted ensembles for all three fixed- 
x m i n tail models, and Figure [2j) shows the bootstrap dis- 
tributions Pr(p) for these models, as well as the one with 
a^min free. Although the bootstrap distributions for the 
log-normal and stretched exponential are shifted to the 
left relative to the two power-law models, all distributions 
overlap and none place significant weight below p = 0.01. 
The failure of the alternatives to disagree with the power 
law can be attributed to their estimated forms roughly 
tracking the power law's over the empirical data's range, 
which leads to similar probabilistic estimates of a catas- 
trophic event. 

D. Impact of covariates 

Not all large terrorist events are of the same type, and 
thus our overall estimate is a function of the relative em- 



7 The question of power-law versus non-power-law [Tr3l is not al- 
ways academic; for instance, macroeconomic financial models 
have traditionally and erroneously assumed non-power-law tails 
that assign negligible probability to large events like widespread 
sub-prime loan defaults [III . 
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pirical frequency of different covariates and the structure 
of their marginal distributions. Here, we apply our proce- 
dure to the distributions associated with a few illustrative 
categorical event covariates to shed some additional light 
on the factors associated with large events. A general- 
ization to and systematic analysis of arbitrary covariates 
is left for future work. 

For instance, international terrorist events, in which 
the attacker and target are from different countries, com- 
prise 12% of the RAND-MIPT database and exhibit a 
much heavier-tailed distribution, with a = 1.93 ± 0.04 
and i m i n = 1 (see Appendix IC3a[) . This heavier tail 
more than compensates for their scarcity, as we estimate 
p = 0.475 (CI: [0.309,0.610]; Fig. Eh) for at least one 
such catastrophic event from 1968-2007. 8 A similar story 
emerges for events in economically developed nations, 
which comprise 5.3% of our data (see Appendix IC 3 b|) . 
Focusing on large such events (x > 10), we estimate 
p = 0.225 (CI: [0.037, 0.499], Fig. E?>). 

Another important event covariate is the type of 
weapon used. The tails of the weapon-specific distribu- 
tions remain well described as power laws, but weapons 
like guns, knives and explosives exhibit less heavy tails 
(fewer large events) than unconventional weapons [33| . 
even as the former are significantly more common than 
the latter. The estimation algorithm used above can be 
generalized to handle categorical event covariates, and 
produces both marginal and total probability estimates 
(see Appendix IC 3 c|) . Doing so yields an overall estimate 
of p = 0.564 (CI: [0.338,0.839]; Fig. [7]). Examining the 
marginal hazard rates, we see that the largest contribu- 
tion comes from explosives, followed by fire and firearms. 



III. STATISTICAL FORECASTS 

If the social and political processes that generate ter- 
rorist events worldwide are roughly stationary, our al- 
gorithm can be used to make principled statistical fore- 
casts about the future probability of a catastrophic event. 
Although here we make the strong assumption of sta- 
tionarity, this assumption could be relaxed using non- 
stationary forecasting techniques [54-56]. 

A simple forecast requires estimating the number of 
events n expected over the fixed forecasting horizon t. 
Using the RAND-MIPT data as a starting point, we cal- 
culate the number of annual deadly events worldwide 
"■year over the past 10 years. Figure [3] shows the em- 
pirical trend for deadly terrorist events worldwide from 
1998-2007, illustrating a 20-fold increase in n yea r, from 
a low of 180 in 1999 to a high of 3555 in 2006. Much 
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FIG. 3: (upper) Number of deadly (domestic and interna- 
tional) terrorist events worldwide for the ten year period 
1998-2007, and three forecast scenarios, (lower) Fraction of 
events that are severe, killing at least 10 individuals and its 
10-year average (dashed line). 



of the increase is attributable to conflicts in Iraq and 
Afghanistan; excluding events from these countries sig- 
nificantly reduces the increase in n year , with the maxima 
now being 857 deadly events in 2002 and 673 in 2006. 
However, the fraction of events that are severe (x > 10) 
remains constant, averaging (pt&a) = 0.082684 (or about 
8.3%) in the former case and 0.072601 (or about 7.3%) 
in the latter. 

An estimated trend over the next decade could be 
obtained via fitting standard statistical models to an- 
nual data or by soliciting judgements from domain ex- 
perts about specific conflicts. For instance, Iraq and 
Afghanistan may decrease their production rates of new 
events over the next decade, leading n yoar to decrease 
unless other conflicts replace their contributions. Rather 
than make potentially overly specific predictions, we in- 
stead consider three rough scenarios (the future's trajec- 
tory will presumably lay somewhere between): (i) an op- 
timistic scenario, in which the average number of terror- 
ist attacks worldwide per year returns to its 1998-2002 
level, at about (n ycar ) = 400 annual events; (ii) a sta- 
tus quo scenario, where it remains at the 2007 level, at 
about 2000 annual events; and finally (hi) a pessimistic 
scenario, in which it increases to about 10,000 annual 
events. 9 

A quantitative statistical forecast is then obtained by 
applying the estimation algorithm to the historical data 
(now including the 9/11 event) and then generating syn- 
thetic data with the estimated number of future events 



The implication of a larger p for a covariate distribution, as com- 
pared to the full data set, is a smaller p for the excluded types 
of events. That is, a larger p for international events implies a 
smaller p for domestic events. 



Modeling these rough event counts via a Poisson process with 
rate A scenar i would refine our forecasts slightly. More detailed 
event production models could also be used. 
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Pr(a; > 2749) forecast, 


2012-2021 




"optimistic" 


"status quo" 


"pessimistic" 


tail model 


n yoa r ~ 400 


n yea .r » 2000 


n ycar « 10, 000 


power law 


0.117 


0.461 


0.944 


stretched exp. 


0.072 


0.306 


0.823 


log- normal 


0.043 


0.193 


0.643 



TABLE II: Forecast estimates of at least one catastrophic 
event worldwide over a 10 year period, using three tail models 
in each of three scenarios forecast scenarios. 



fttaii- For each scenario, we choose 

^decade 

= 10 

and choose n ta ;i via Eq. ([T]) with p ta ;i = 0.082684 (its 
historical average). Finally, we fix x m i n = 10 to facilitate 
comparison with our alternative tail models. 

Table [XT] summarizes the results, using 100,000 boot- 
straps for each of the three tail models in the three fore- 
cast scenarios. Under the status quo scenario, all three 
models forecast a 19-46% chance of at least one catas- 
trophic event worldwide in the next decade. In the op- 
timistic scenario, with events worldwide being about 5 
times less common, the models forecast a 4-12% chance. 
These estimates depend strongly on the overall frequency 
of terrorist events n yea r- Thus, the greater the popularity 
of terrorism worldwide, i.e., the more often terrorist at- 
tacks are launched, the greater the general likelihood that 
at least one will be catastrophic. Any progress in moving 
the general frequency of terrorism toward the more opti- 
mistic scenario is likely to reduce the overall, near-term 
probability of a catastrophic event. 



IV. IMPROVED ESTIMATES 

Our analysis places the 1968-2007 worldwide historic 
probability of a catastrophic event in the 11-35% range 
(see Table H} and none of the alternative or covariatc 
models provide any support for judging the size of 9/11 
as statistically unlikely. The wide confidence interval 
illustrates the difficulty of obtaining precise estimates 
when accounting for model and parameter uncertainty. 
That being said, our calculations could be further refined 
to improve the overall estimates, incorporate additional 
sources of uncertainty, or address specific questions, by 
relaxing portions of our iid treatment of event severities. 
We discuss several such possibilities here, but leave their 
investigation for the future. 

First, our algorithm assumes a stationary event gen- 
eration process, which is unlikely to be accurate in the 
long term. Technology, population, culture and geopoli- 
tics are believed to exhibit non-stationary dynamics and 
these likely play some role in event severities. Thus, tech- 
niques for statistical forecasting in non-stationary time 
series |54l - l56l | could be used to identify subtle trends in 
the relevant covariates to make more accurate forecasts. 

Second, our algorithm is silent regarding efforts to pre- 
vent events or mitigate their severity [57j . However, the 
historical impact of these processes is implicitly present 



in our empirical data because only events that actually 
occurred were recorded. Thus, our results may be in- 
terpreted as probabilities conditioned on historical pre- 
vention or mitigation efforts. To the extent that policies 
have an impact on incidence and severity, more accurate 
estimates may be achievable by incorporating models of 
policy consequences or interactions between different ac- 
tors. Similarly, our algorithm is silent regarding the ac- 
tors responsible for events, and incorporatin g m odels of 
organizational capabilities, proclivities, etc. |24l l53l . l58j 
may improve the estimates. 

Finally, our approach is non-spatial and says little 
about where the event might occur. Incorporating more 
fine-grained spatial structure, e.g., to make country- level 
or theatre-level estimates [59| (as is now being done in 
seismology [60]), or incorporating tactical information, 
e.g., about specific CBRN attacks, may be possible. Such 
refinements will likely require strong assumptions about 
many context-specific factors [61j , and it remains unclear 
whether accurate estimates at these scales can be made. 
At the worldwide level of our analysis, such contingencies 
appear to play a relatively small role in the global pat- 
tern, perhaps because local-level processes are roughly 
independent. This independence may allow large-scale 
general patterns to emerge from small-scale contingent 
chaos via a Central Limit Theorem averaging process, 
just as regularities in birth rates exist in populations 
despite high contingency for any particular conception. 
How far into this chaos we can venture before losing gen- 
eral predictive power remains unclear [iH . [62j . 



V. DISCUSSION 

In many complex social systems, although large events 
have outsized social significance, their rarity makes them 
difficult to study. Gaining an understanding of such sys- 
tems requires determining if the same or different pro- 
cesses control the appearance of small, common events 
versus large, rare events. A critical scientific problem is 
estimating the true but unknown probability of such large 
events, and deciding whether they should be classified as 
statistical outliers. Accurate estimates can facilitate his- 
torical analysis, model development and statistical fore- 
casts. 

The algorithm described here provides a principled and 
data-driven solution for this problem that naturally in- 
corporates several sources of uncertainty. Conveniently, 
the method captures the tendency of highly-skewed dis- 
tributions to produce large events without reference to 
particular generative mechanisms or strong assumptions 
about the tail's structure. When properly applied, it 
provides an objective estimate of the historical or future 
probability of a rare event, e.g., an event that has oc- 
curred exactly once. 

Using this algorithm to test whether the size of the 
9/11 terrorist events, which were nearly six times larger 
than the next largest event, could be an outlier, we esti- 
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mated the historical probability of observing at least one 
9/1 1-sized event somewhere in the world over the past 40 
years to be 11-35%, depending on the particular choice of 
tail model used to fit the distribution's upper tail. These 
values are much larger than any reasonable definition of 
a statistical anomaly and thus the size of 9/11, which 
was nearly six times larger than the next largest event, 
should not be considered statistically unlikely, given the 
historical record of events of all sizes. 

This conclusion is highly robust. Conditioning on the 
relative frequency of important covariates [33j], such as 
the degree of economic development in the target country, 
whether an event is domestic or international, or the type 
of weapon used, we recover similar estimates, with addi- 
tional nuance. Large events are probabilistically most 
likely to target economically developed nations, be inter- 
national in character and use explosives, arson, firearms 
or unconventional weapons. Although chemical and bio- 
logical events can also be very large Q , historically they 
are rare at all sizes, and this outweighs the heaviness of 
their tail. 

Furthermore, using only event data prior to 9/11 (as 
opposed to using all available data sans 9/11), we find 
a similar overall historical hazard rate. This suggests 
that the worldwide probability for large events has not 
changed dramatically over the past few decades. In con- 
sidering three simple forecast scenarios for the next 10 
years, we find that the probability of another large event 
is comparable to its historical level over the past 40 years. 
This risk seems unlikely to decrease significantly without 
a large reduction in the number of deadly terrorist events 
worldwide. 

Of course, all such estimates are only as accurate as 
their underlying assumptions, and our method treats 
event sizes as iid random variables drawn from a station- 
ary distribution. For complex social phenomena in gen- 
eral, it would be foolish to believe this assumption holds 
in a very strong sense, e.g., at the micro-level or over 
extremely long time scales, and deviations will lower the 
method's overall accuracy. For instance, non-stationary 
processes may lower the global rate of large events faster 
than smaller events, leading to overestimates in the true 
probability of a large event. However, the iid assumption 
appears to be statistically justified at the global spatial 
and long-term temporal scales studied here. Identifying 
the causes of this apparent iid behavior at the global scale 
is an interesting avenue for future work. 

The relatively high probability of a 9/1 1-sized event, 
both historically and in the future, suggests that the 
global political and social processes that generate large 
terrorist events may not be fundamentally different from 
those that generate smaller, more common events. Al- 
though the mechanism for event severities remains un- 
clear [63|, the field of possible explanations should likely 
be narrowed to those that generate events of all sizes. 

Independent of mechanistic questions, the global prob- 
ability of another large terrorist event remains uncom- 
fortably high, a fact that can inform our expectations (as 



with large natural disasters [34h 36] ) of how many such 
events will occur over a long time horizon and how to 
appropriately anticipate or respond to them. This per- 
spective is particularly relevant for terrorism, as classical 
models aimed at predicting event incidence tend to dra- 
matically underestimate event severity [33| . 

To conclude, the heavy-tailed patterns observed in the 
frequency of severe terrorist events suggests that some as- 
pects of this phenomenon, and possibly of other complex 
social phenomena, are not nearly as contingent or unpre- 
dictable as is often assumed. That is, there may be global 
political and social processes that can be effectively de- 
scribed without detailed reference to local conflict dy- 
namics or the strategic tradeoffs among costs, benefits 
and preferences of individual actors. Investigating these 
global patterns offers a complementary approach to the 
traditional rational-actor framework [12j | and a new way 
to understand what regularities exist, why they exist, and 
their implications for long-term stability. 
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Appendix A: Tail models 

The functional form and normalization of the tail 
model should follow the type of empirical data used. For 
instance, if the empirical data are real- valued, the power- 
law tail model has the form 



Pr(y | a, x n 



1 



V 



(Al) 



with a > 1 and y > x min > 0. Given a choice of a; m j n , 
the maximum likelihood estimator for this model is 



a = 1 



y^ln^i/ffmin) 



(A2) 



The severity of a terrorist attack, however, is given by an 
integer. Thus, in our analysis of terrorist event severities, 
we use the discrete form of the power-law distribution 



Pr(y | a, x min ) = y a /C(a, x min ) 



(A3) 



with a > 1 and y > x m - m > 0, and where C(a,x m i n ) = 
Yl^x ■ 1S tne generalized or incomplete zeta func- 
tion. The MLE in for the discrete power law is less 
straightforward, being the solution to the transcenden- 
tal equation 



C iP^I ^min) 
C(^; ^min) 



1 n 

= V X l 

71 * * 



(A4) 
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However, it is straightforward to directly maximize the 
log-likelihood function for the discrete power law in order 
to obtain a: 

n 

C(a) = — n In £(a, x m ; n ) — a lnx, . (A5) 

»=i 

Past work shows that the continuous model given by 
Eq. (|A3|) provides a reasonably good approximation to 
the discrete case when x m i n takes moderate values. In 
our own experiments with this approximation, we find 
that when x m i n > 10 the difference in estimated proba- 
bilities for observing one or more 9/11-sized events be- 
tween using the discrete versus continuous model is at 
most few percent. 

Estimates of x m i n may be obtained using any of sev- 
eral existing automatic methods [13, E^-EB] ■ We use the 
Kolmogorov-Smirnov goodness-of-fit statistic minimiza- 
tion (KS- minimization) technique [l6l |33| . This method 
falls in the general class of distance minimization tech- 
niques for selecting the size of the tail , and was pre- 
viously used to analyzing event severities in global ter- 
rorism. The KS statistic [64[ is the maximum distance 
between the CDFs of the data and the fitted model: 

D = max \S(x) - P{x)\ , (A6) 

X>Xtain 

where S(x) is the CDF of the data for the observations 
with value at least x m j n , and P{x) is the CDF of the 
maximum-likelihood power-law model for the region x > 
x m in- Our estimate x m i n is then the value of x m i n that 
minimizes D. In the event of a tie between several choices 
for x nnn , we choose the smaller value, which improves the 
statistical power of subsequent analyses by choosing the 
larger effective sample size. 

Our alternative tail models are the log-normal and 
the stretched exponential distributions, modified to in- 
clude a truncating parameter x m i n . These distributions 
are normally defined on continuous variables. The struc- 
ture of their extreme upper tails for x m - m = 10, however, 
is close to that of their discrete versions, and the con- 
tinuous models are significantly easier to estimate from 
data. For the results presented in the main text, we used 
the continuous approximation of the upper tails for these 
models. 



Appendix B: Estimator accuracy 

We quantify the expected accuracy of our estimates 
under two experimental regimes in which the true prob- 
ability of a catastrophic event can be calculated analyti- 
cally. 

1. Draw n values iid from a power-law distribution 
with x m ; n = 10 and some a; define x = max^jxi}, 
the largest value within that sample. This step en- 
sures that we treat the synthetic data exactly as we 



treated our empirical data, and provides a particu- 
larly challenging test as the largest generated value 
exhibits the greatest statistical fluctuations. 

2. Draw n — 1 iid values from a power-law distribution 
with x m ; n = 10 and some a, and then add a single 
value of size x whose true probability of appearing 
under the generative model is p = 0.001, i.e., we 
contaminate the dataset with a genuine outlier. 

Figure U shows the results of both experiments, where 
we measure the mean absolute error (MAE) and the 
mean ratio between p and the true p. Even for sam- 
ples as small as n — 40 observations, the absolute error 
is fairly small and decreases with increasing sample size 
n. In the first experiment, the error rate decays like 
0(n -1 / 3 ), approaching 0.01 error rates as n approaches 
5000 (Fig. 0k), while in the second it decays like 0(n _1 ) 
up to about n = 4000, above which the rate of decay 
attenuates slightly (Fig. |4p). 

Absolute deviations may mask dramatic relative er- 
rors, e.g., if the true probability is very close to one or 
zero (as in our contaminated samples experiment). The 
mean ratio of p to p would reveal such mistakes. The 
lower panels in Fig. |4] show that this is not the case: the 
estimation procedure is close both in absolute and in rel- 
ative terms. As the sample size increases, the estimated 
probability converges on the true probability. For con- 
taminated data sets, the p/p can be fairly large when n is 
very small, but for sample sizes of a few hundred obser- 
vations, the method correctly estimates the relative size 
of the outlier's probability. 

Appendix C: Robustness checks 

We present three checks of the robustness of our proba- 
bility estimates, (i) using simple parametric models with- 
out the bootstrap, (ii) using an alternative source of ter- 
rorist event data, and (iii) using event covariates to refine 
the estimates. In each case, we find roughly similar-sized 
estimates. 



1. Estimates using simple models 

A simpler model for estimating the historical probabil- 
ity of a 9/11-sized or larger terrorist event assume (i) a 
stationary generative process for event severities world- 
wide, (ii) event sizes are iid random variables drawn from 
(iii) a power-law distribution that (iv) spans the entire 
range of possible severities (x m i n = 1), and (v) has a 
precisely-known parameter value a — 2.4. 

A version of this model was used in a 2009 Depart- 
ment of Defense-commissioned JASON report on "rare 
events" [j| , which estimated the historical probability of 
a catastrophic (9/11-sized or larger) terrorist event as 
23% over 1968-2006. The report used a slightly erro- 
neous estimate of the power law's normalizing constant, 
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FIG. 4: The mean absolute error {\p — p\) and mean relative error {p/p) — 1 for (a) n values drawn iid from a stationary 
power-law distribution with Xmin = 10 and some a, with the target size being the single largest value in the draw, and for (b) 
n — 1 values to which we add a single outlier (with true p — 0.001). In both experiments, both types of errors are small even 
for fairly small sample sizes, and decay further as n increases. 



a slightly different estimate of a and a smaller value of 
n. Here, we repeat the JASON analysis, but with more 
accurate input values. 

Let q(x) be the probability of observing a catastrophic 
event of size x. With event severities being iid ran- 
dom variables drawn from a fixed distribution Pr(y), the 
generation of catastrophic events can be described by a 
continuous-time Poisson process with rate q(x) |65|. Ap- 
proximating a; as a continuous variable, in a sequence of n 
such events, the probability p of observing at least one of 
catastrophic severity is 

p = l-[l-q(x)] n 

The rate q{x) is simply the value of the complementary 
CDF at x, and for a power-law distribution is given by 

poo 

q{x) = / Pr(y)y 

J X 

poo 

= («-iX7n / y- a y 

J X 

= (^-) la - (C2) 

\ ^rain / 

for x > x m i n . Substituting x m i a — 1 and a = 2.4 
yields the per-event probability of a catastrophic event 
g(2749) = 0.0000153164. 

The RAND-MIPT database records n = 13274 deadly 
events worldwide from 1968-2007; thus, substituting n 
and q(x) into Eq. (|C1[) yields a simple estimate of the 
probability of observing at least one catastrophic event 
over the same time period p = 1 — e~ 13274 9( 2749 ) = 0.184, 
or about 18%. 

However, this calculation underestimates the true 
probability of a large event because the empirical distri- 
bution decays more slowly than a power law with a — 2.4 



at small values of x. Empirically 7.5% of the 13,274 fatal 
events have at least 10 fatalities, but a simple application 
of Eq. (|C2p using x = 10 shows that our model predicts 
that only 4.0% of events should be this severe. Thus, 
events with x > 10 occur empirically almost twice as 
often as expected, which leads to a significant underesti- 
mate of p. 

By restricting the power-law model to the tail of the 
distribution, setting x m i n = 10 and noting that only 
n = 994 events had at least this severity over the 40 year 
period, we can make a more accurate estimate. Repeat- 
ing the analysis above, we find <?(2749) = 0.0000288098 
and p = 0.318, or about a 32% chance of a catastrophic 
event, 10 a value more in line with the estimates derived 
using our bootstrap-based approach in the main text. 



2. Estimates using the Global Terrorism Database 

An alternative source of global terrorism event data is 
the Global Terrorism Database 0], which contains 98,112 
events worldwide from 1970-2007. Of these, 38,318 were 
deadly (x > 0). Some events have fractional severities 
due to having their total fatality count divided evenly 
among multiple event records; we recombined each group 
of fractional-severity events into a single event, yielding 
38,255 deadly events over 38 years. Analyzing the GTD 
data thus provides a check on our results for the RAND- 
MIPT data. 



To make our reported per-event probabilities q(x) consistent 
across models, we report them as q(x) = Pr(X > x \ X > 
£min)P r (^ ^ ^min); i- e -i the probability that a tail event is 
catastrophic times the probability that the event is a tail event. 
These values can be used with Eq. I)C1|I to make rough estimates 
if the corresponding n is the total number of deadly events. 
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FIG. 5: Empirical distribution of event severities from the 
GTD [2| with 100 power-law models, fitted to bootstraps of 
the data. Inset shows the estimated distribution of binomial 
probabilities Pr(p) for one or more catastrophic events. 



The largest event in the GTD is 9/11, with severity 
2763, and the second largest is the 13 April 1994 Rwan- 
dan massacre of Tutsi refugees, with 1180 reported fatal- 
ities. This event is absent from the RAND-MIPT data; 
its inclusion in the GTD highlights this data set's broader 
definition of terrorism, which includes a number of geno- 
cidal or criminal events. 

The best fittingpower-law model obtained using the 
methodology of [16| is a = 2.91 ± 0.22 and x m \ n = 39. 
The p < 0.1 for this model may be attributable to the 
unusually large number of perfectly round-number sever- 
ities in the dataset, e.g., 10, 20, 100, 200, etc., which in- 
dicates rounding effects in the reporting. (These appear 
in Fig. [5] as small discontinuous drops in the complemen- 
tary CDF at round-number locations; true power-law dis- 
tributed data have no preference for round numbers and 
thus their presence is a statistically significant deviation 
from the power-law form.) 

Using the algorithm described in the main text with 
10, 000 bootstraps, we estimate a 38-year probability of 
at least one catastrophic event as p = 0.534 (with 90% 
CI [0.115,0.848]) or about a 53% chance. Repeating our 
analysis using the two alternative tail models yields only 
a modest decrease, as with the RAND-MIPT data. 

Figure [5] shows the empirical fatality distribution along 
with 100 fitted power-law models, illustrating the heavy- 
tailed structure of the GTD severity data. Notably, the 
maximum likelihood estimate for a is larger here (in- 
dicating a less heavy tail) than for the RAND-MIPT 
data. However, the marginal distribution Pr(d) is bi- 
modal, with one mode centered on« = 2.93 and a second 
larger mode centered at roughly a = 2.4, in agreement 
with the RAND-MIPT data. Furthermore, the failure 
of the GTD-estimatcd p to be dramatically lower than 
the one estimated using RAND-MIPT data supports our 
conclusion that the size of 9/11 was not statistically un- 
likely. 



3. Impact of event covariates 

a. International versus domestic, and events prior to 1998 

Events in the RAND-MIPT database with dates be- 
fore 1 January 1998 are mainly international events, i.e., 
the attacker's country of origin differed from the target's 
country. Subsequent to this date, both domestic and 
international events are included but their domestic ver- 
sus international character is not indicated. Analyzing 
events that occurred before this breakpoint thus provides 
a natural robustness check for our overall estimate: (i) 
we can characterize the effect that domestic versus inter- 
national events have on the overall estimate and (ii) we 
can test whether the probability estimates have changed 
significantly in the past decade. 

The pre-1998 events comprise 12% of the RAND-MIPT 
database and exhibit a more heavy-tailed distribution 
(q = 1.92 ± 0.04 and x m in = 1). Using 10,000 boot- 
straps, we estimate p = 0.475 (90% CI: [0.309,0.610]) 
for at least one catastrophic international event over the 
target period. Figure^ shows the empirical distribution 
for international events and the ensemble of fitted mod- 
els, illustrating good visual agreement with the empirical 
distribution. 

The estimate for international-only data (p — 0.475) is 
larger than the estimate derived using the full data set 
(p = 0.347), although these values may not be as different 
as they seem, due to their overlapping confidence inter- 
vals. Fundamentally, the larger estimate is caused by 
the heavier-tailed distribution of the international-only 
data. Because the full data set includes these interna- 
tional events, this result indicates that domestic events 
tend to exhibit a lighter tail, and thus generate large ter- 
rorist events with smaller probability. As a general guide- 
line, subsets of the full data set should be analyzed with 
caution, as their selection is necessarily conditioned. The 
full data set provides the best estimate of large events of 
all types. 

b. Economic development 

A similar story emerges for deadly events in econom- 
ically developed nations, defined here as the member 
countries of the Organisation for Economic Co-operation 
and Development (OECD), as of the end of the period 
covered by the RAND-MIPT data, which are 5.3% of all 
deadly events. The empirical distribution (Fig. [SJd) of 
event severities shows unusual structure, with the upper 
tail (x > 10 fatalities) decaying more slowly than lower 
tail. To handle this oddity, we conduct two tests. 

First, we consider the entire OECD data set, estimat- 
ing both a and x m i n . Using 10,000 bootstraps yields 
p = 0.028 (with 90% CI [0.010,0.053]) or roughly a 3% 
chance over the 40 year period, which is slightly above 
our p = 0.01 cutoff for a statistically unlikely event. Fig- 
ure [SJd shows the resulting ensemble of fitted models, il- 
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FIG. 6: Empirical distributions, with 100 power-law bootstrap models, for (a) international events (events from 1968-1997 in 
the RAND-MIPT database) and (b) events within the OECD nations; dashed lines show the 90% CI on &. Insets show the 
estimated distribution Pr(p) with 90% confidence intervals (shaded area) and overall estimate (dashed line). 



lustrating that the algorithm is placing very little weight 
on the upper tail. Second, we apply the algorithm with a 
fixed x m in = 10 in order to focus explicitly on the distri- 
bution's upper tail. In this case, 10,000 bootstraps yields 
p = 0.225, with 90% CI as [0.037,0.499]. 

c. Type of weapon 

Finally, we consider the impact of the attack's weapon 
type, and we generalize the estimation algorithm to the 
multi-covariate case. Events are classified as (i) chemical 
or biological, (ii) explosives (includes remotely detonated 
devices), (hi) fire, arson and firebombs, (iv) firearms, (v) 
knives and other sharp objects, and (vi) other, unknown 
or unconventional. Given the empirically observed distri- 
butions over these covariates, we would like to know the 
probability of observing at least one catastrophic-sized 
event from any weapon type. 

This requires generalizing our Monte Carlo algorithm: 
Let (x, c)i denote the severity x and categorical covariate 
c for the ith event. Thus, denote the empirical data by 
X = {(x,c) l }. 

1. Generate Y by drawing (y,c)j, j = 1, . . . ,n, uni- 
formly at random, with replacement, from the orig- 
inal data {{x,c)i} (sans the £ catastrophic events). 

(c) 

2. For each covariate type c in Y, jointly estimate x m [ n 
and the tail-model parameters 9^ c \ and compute 
n[% = #{% > 

3. For each covariate type c in Y, generate a synthetic 

(c) 

data set by drawing n t J a random deviates from the 
fitted tail model with parameters 6^ . 

4. If any of the covariate sequences of synthetic events 
includes at least I events of size x or greater, set 
p = 1; otherwise, set it to zero. 

In applying this algorithm to our data, we choose 1=1 
and x = 2749, as with our other analyses. In step 2, 



we again use the KS-minimization technique of [161 ] to 
choose x min and estimate for a power-law tail model 
via maximum likelihood. Finally, as with the univariate 
version of the algorithm, bootstrap confidence intervals 
may be obtained [4l|, both for the general hazard and 
the covariate-specific hazard, by repeating steps 3 and 4 
many times for each bootstrap and tracking the distribu- 
tion of binomial probabilities. 

Using 10,000 bootstraps, drawing 1000 synthetic data 
sets from each, we estimate p = 0.564, with 90% confi- 
dence intervals of Again, this value is well above 
the cutoff for a 9/11-sized attack being statistically un- 
likely. Figures [7^,-f show the ensembles for each weapon- 
specific severity distribution. As a side effect of this 
calculation, we may also calculate the probability that 
a catastrophic event will be generated by a particular 
type of weapon. The following table gives these marginal 
probability estimates, which are greatest for explosives, 
fire, firearms and unconventional weapon types. 

It is emphasized that these are historical estimates, 
based on the relative frequencies of weapon covariates in 
the historical RAND-MIPT data. If the future exhibits 
similar relative frequencies and total number of attacks, 
then they may also be interpreted as future hazards, but 
we urge strong caution in making these assumptions. 



weapon type 


historical p 


90% CI 


chcm. or bio. 


0.023 


[0.000, 0.085] 


explosives 


0.374 


[0.167, 0.766] 


fire 


0.137 


[0.012, 0.339] 


firearms 


0.118 


[0.015, 0.320] 


knives 


0.009 


[0.001, 0.021] 


other or unknown 


0.055 


[0.000, 0.236] 


any 


0.564 


[0.338, 0.839] 



The sum of marginal probabilities exceeds that of the 
"any" column because in some trials, catastrophic events 
are generated in multiple categories. 
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FIG. 7: Empirical distribution, with 100 power-law bootstrap models, for events using (a) chemical or biological, (b) explosives 
(includes remote detonation), (c) fire, arson and firebombs, (d) firearms, (e) knives or sharp objects, and (f) other, unknown 
or unconventional weapons. Insets: marginal distributions of estimated hazard rates Pr(p), with the region of 90% confidence 
shaded and the mean value indicated by the dashed line. 
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